Much current network analysis is predicated on the assumption that important biological networks 
will either possess scale free or exponential statistics which are independent of network size allowing 
unconstrained network growth over time. In this paper, we demonstrate that such network growth 
models are unable to explain recent comparative genomics results on the growth of prokaryote 
regulatory gene networks as a function of gene number. This failure largely results as prokaryote 
regulatory gene networks are "accelerating" and have total link numbers growing faster than linearly 
with network size and so can exhibit transitions from stationary to nonstationary statistics and 
from random to scale-free to regular statistics at particular critical network sizes. In the limit, 
these networks can undergo transitions so marked as to constrain network sizes to be below some 
critical value. This is of interest as the regulatory gene networks of single celled prokaryotes are 
indeed characterized by an accelerating quadratic growth with gene count and are size constrained 
to be less than about 10,000 genes encoded in DNA sequence of less than about 10 megabases. 
We develop two "nonaccelerating" network models of prokaryote regulatory gene networks in an 
endeavor to match observation and demonstrate that these approaches fail to reproduce observed 
statistics. 

I. INTRODUCTION 



The difficulty of developing fully scalable technologies 
which can be equally applied to both very small and very 
large systems explains much of the current fascination 
with network analysis. This field examines how grow- 
ing networks can display stationary (size independent) 
scale free or exponential statistics which are unchanging 
over vast size ranges, and this field will naturally focus 
on the very large and obvious networks possessing read- 
ily obtainable statistics such as the Internet, the World 
Wide Web and movie databases. However, there is an 
entire class of networks equally important to human so- 
ciety, technology and biology which possess nonstation- 
ary (size dependent) connectivity statistics and which are 
thereby forced to undergo structural transitions as they 
grow, sometimes so severe as to limit growth entirely 
(for a review see [1]). The resulting limited size of these 
networks makes them less obvious but does not decrease 
their relevance. 

In particular, prokaryote gene regulatory networks exploiting homology based (sequence specific) interactions will display nonstationary or "accelerating" statistics where the link number per node grows linearly with network size (so total link number grows quadratically with network size), so these networks will be inherently constrained to have sizes less than about 20,000 genes [?]. In fact, all prokaryotic gene numbers and genomes are indeed of restricted size (less than about 10,000 genes with genomes of between 0.5 and 10 megabases [?]), in contrast to the genomes of multicellular eukaryotes (for humans, about 30,000 genes and a genome of about 3 gigabases [?]). 

The rapidly expanding field of network analysis, reviewed in Refs. [2, ?], has provided examples of networks exhibiting "accelerating" network growth where link number grows faster than linearly with network size [?]. For instance, the Internet appears to grow by adding links more quickly than sites, though the relative change over time is small and the Internet appears to remain scale free and well characterized by stationary statistics [11]. Similarly, the number of links per substrate in the metabolic networks of organisms appears to increase linearly with substrate number [?], the average number of links per scientist in collaboration networks increases linearly over time [?], and languages appear to evolve via accelerated growth [?]. Even social networks take on their small world characteristics only when the network is large enough: in small towns everyone knows everyone else, so social networks are accelerating and exhibit a transition to small world statistics only as individual nodes saturate their connectivity limits [19]. Accelerating networks are more prevalent and important in society and in biology than is commonly realized [?]. 

A "probabilistic" accelerating model of prokaryote 
regulatory gene networks has been developed in Ref. [20]. This model combined four elements: probabilistic links, which allow arbitrarily rapid acceleration rates; two distinct classes of nodes, where "regulators" can source outbound regulatory links to other nodes (both regulators and non-regulators) while "non-regulators" cannot; directed links from regulators to regulated nodes; and distinct connectivity distributions describing the long-tailed and scale-free distribution of outbound link number per regulator and the compact and exponential distribution of inbound link number per node. The resulting model satisfactorily matched observable parameters. However, this success is meaningless if similar results can be achieved via nonaccelerating network models. In this paper, we show that the two simplest nonaccelerating network models fail to explain either the observed quadratic growth of regulator number with genome size or the detailed statistics pertaining to the E. coli genome. 

In Section II we canvass the available literature to characterize the statistics of prokaryote gene regulatory networks. This then allows the construction of two nonaccelerating network models in Section III, where we use the continuous approximation and simulations to analyze network statistics, allowing comparison to observation. 

FIG. 1: Double-logarithmic plot of regulatory protein number ($R$) against total gene number ($N_g$) for bacteria (circles) and archaea (triangles), adapted from Ref. [?]. The log-log distribution is well described by a straight line with slope 1.96±0.15 ($r^2 = 0.88$, 95% confidence interval indicated), corresponding to a quadratic relationship between regulator number and genome size. The inset shows the same data before log-transformation. Dashed lines show the best linear fit to the data. 



II. OVERVIEW OF PROKARYOTE GENE 
NETWORKS 

Ongoing genome projects are now providing sufficient data to usefully constrain analysis of the gene regulatory networks of the simpler organisms. Ref. [21] first noted quadratic growth in the class of transcriptional regulators ($R$) with the number of genes $N_g$ in bacteria, with the observed results 

$$ R \propto \begin{cases} N_g^{1.87 \pm 0.13}, & \text{transcriptional regulation} \\ N_g^{2.07 \pm 0.21}, & \text{two component systems} \\ N_g^{2.03 \pm 0.13}, & \text{transcriptional regulation} \\ N_g^{2.16 \pm 0.26}, & \text{transcriptional regulation.} \end{cases} \tag{1} $$

Here, the top two lines refer to different classes of regulators while the bottom two lines are the results of a cross-checking analysis of two alternate databases, and quoted intervals reflect 99% confidence limits [21]. Ref. [?] analyzed 89 bacterial and archaeal genomes to determine the relations 

In all cases, the limits reflect 95% confidence levels, and for completeness, the data is shown in Fig. 1. The observed quadratic growth implies an ever growing regulatory overhead, so there will eventually come a point where continued genome growth requires the number of new regulators to exceed the number of nonregulatory nodes. Based on this, Ref. [?] predicted an upper size limit of about 20,000 genes, within a factor of two of the observed ceiling. A number of other papers have noted the faster than linear growth of regulator number with genome size. In particular, it was noted that larger genomes harboured more transcription factors per gene than smaller ones and that regulators form an increasing proportion of all genes as genome size increases [?]. 

Prokaryotes typically group their DNA encoded genes in operons, co-regulated functional modules of average size 1.70 genes each in E. coli, a value we treat as typical though in reality operon size decreases slightly with genome size [?]. E. coli regulatory proteins affect an average of about 5 operons, with this distribution being long tailed [?], so the majority of regulators affect only one operon though some regulators (CRP) can affect up to 71 operons or 133 genes [?]. More recent estimates show this transcription factor (CRP, a global sensor of food levels in the environment) regulating up to 197 genes directly and a further 113 genes indirectly via 18 other transcription factors [28]. (To observe the long tailed distribution, see Fig. 2 of Ref. [27] and Fig. 4 of Ref. [?].) 

The number of inputs taken by an operon is character- 
ized by a compact exponential distribution with a rapidly 
decaying tail so the majority of regulated operons are 
controlled by a single regulator while very few regulated 
operons are controlled by four, five, six or seven regu- 
lators [2(| l28|. T he average number of inputs in E. 
coli is about llTjH, 1.5 [27|, or 1.6. In addition, 

31.4% of E. coli transcription factors regulate other transcription factors [28], while 37.7% of non-autoregulatory cascades in E. coli are of length two, 52.5% are three-level cascades, and 9.8% are four-level cascades [28]. 



III. NONACCELERATING PROKARYOTE 
NETWORK MODELS 

We extend the gene network model of Refs. [?, 27] to construct two nonaccelerating network models of prokaryote regulatory gene networks. Prokaryotes typically pack their $N_g$ genes into a lesser number of $N = N_g/g_o$ co-regulated operons, where we assume that operons contain exactly $g_o = 1.70$ genes. Of the existing operons, $O_r$ are regulated operons and $O_u = N - O_r$ are unregulated operons. Of the total number of operons, there are $R$ regulatory operons whose regulatory interactions are directed links from regulatory operons to regulated operons. Under the assumption that there is only one regulatory gene per regulatory operon, the observed linear relation of Eq. (2) becomes 

$$ R = c N_g = c g_o N. \tag{3} $$

In nonaccelerating network models, the number of links per regulator is constant, so the total number of links must increase linearly with network size, giving 

$$ L = l N. \tag{4} $$

Here, the value for $l$ will be approximately $c g_o = 0.0935$, but the exact relation must be derived from the details of the implemented model. 




FIG. 2: An example statistically generated E. coli genome using the later results of the one-parameter model, where (for convenience only) operon nodes numbered $n_1, \ldots, n_N$ are placed sequentially counterclockwise on a circle in their historical order of entry into the genome. The filled points on the outer circle locate regulators and have radius indicating the number of outbound regulatory links. The open points on the middle circle locate regulated operons and have radius indicating the number of inbound regulatory inputs. The arrows in the inner circle show all directed regulatory links. 

Following Ref. [?], each regulatory link between nodes is directed and characterized by two distinct distributions describing respectively the placement of the heads and tails of each link. Only a relatively few nodes are regulatory, and of these, the number of outbound link tails per regulatory node is described by a size dependent long-tailed distribution with average about $\langle t \rangle \approx 5$. Such a long-tailed distribution requires that link tails be preferentially attached to an existing regulatory operon, and this requirement places restrictions on the gene duplication processes assumed by the model (see Ref. [20] for details). In contrast to the relatively small number of regulatory nodes, all nodes can themselves be multiply regulated by inbound links. Further, the many used and unused promoter region binding sites broadly sample the space of possible binding sites, so only a small fraction of nodes will be regulated by any one regulator. As a result, the number of inbound link heads per node is described by a size dependent exponential distribution with a low average of $\langle h \rangle \approx 1.5$, as typically results from the random or non-preferential attachment of inbound links to operon promoter sequences. 

We suppose that the operon network grows by the sequential addition of numbered nodes $n_k$ for $1 \le k \le N$, and that at network size $k$, node $n_i$ ($1 \le i \le k$) has $t_{ik}$ outbound tails and $h_{ik}$ inbound heads. We do not model the many trials of potential genes over many generations and merely include fixated genes in our count; that is, drifting sequence is not counted as part of the fixated genome. This further implies that the sequence of established nodes is under severe selective constraint and unable to drift, so new links cannot be added between existing nodes. 

For clarity, Fig. 2 preempts later calculations (from the one-parameter model) and depicts a statistically generated version of an E. coli genome where nodes are placed sequentially counterclockwise in a circle (for convenience only). Alternative genome models may be distinguished by the age distribution of regulators, regulated operons and their link numbers, and these are indicated in this figure. In particular, Fig. 2 shows a highly nonuniform distribution of both regulators and outbound link numbers and of regulated operons and inbound link numbers with gene age. (These age-independent distributions are in marked contrast to those generated by accelerating models of regulatory gene networks [20].) 

A substantial proportion of the gene regulation network of prokaryotes is enacted via homology dependent interactions, as when sequence specified protein transcription factors bind to specific promoter sequences. Naturally then, regulators will form more links in larger genomes than in smaller genomes [?]. Such interactions lead immediately to accelerating models of gene regulatory networks [20], making it difficult to propose plausible physical mechanisms restricting regulators to form the same probable number of initial links independent of genome size and thereby implement a nonaccelerating model. However, the purpose of this paper is to fully evaluate nonaccelerating gene regulatory network models, and we here presume that such physical mechanisms exist (without detailing them). 

A. One-parameter model 

For our first model, we assume that on entry into the genome, each new node $n_k$ can form a total of up to $m$ outbound regulatory links with all the nodes $n_1, \ldots, n_k$, with each individual link forming with probability $p$, and, provided that sufficient regulators already exist, up to a total of $m$ inbound regulatory links, each with probability $p$, from some subset of the existing regulators chosen according to preferential attachment. (For consistency, the probable number of inbound distinct regulatory links to node $n_k$ ($\approx mp$) must be less than the probable number of existing regulators ($c g_o N$), satisfied when genomes have size $N > mp/(c g_o)$.) Hence, the respective probabilities that the initial number of heads $h_{kk} = j$ or the initial number of tails $t_{kk} = j$ for node $n_k$ is 

$$ P(j) = \binom{m}{j} p^j (1 - p)^{m-j}, \tag{5} $$

with the proviso that all the inbound links can only be added to node $n_k$ if there is a sufficient number of regulators among the nodes $n_1, \ldots, n_k$. The average number of inbound and outbound links is identical, $\langle t_{kk} \rangle = \langle h_{kk} \rangle = mp$, independent of network size. The addition of node $n_k$ and its links will increase the probable number of heads attached to earlier nodes $n_j$ for $1 \le j \le (k - 1)$ so $h_{jk} \ge h_{jj}$, while the probable number of tails outbound from node $n_j$ increases, $t_{jk} \ge t_{jj}$, if and only if that node is regulatory with $t_{jj} > 0$. 
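The entry rule above can be illustrated with a minimal sketch, assuming each of the $m$ potential outbound links forms independently with probability $p$, so that a node is a regulator when it enters with at least one tail. The function names, the seed and the network size are hypothetical choices for illustration; the values $m = 20$ and $p = 0.00490$ are those adopted later in this section.

```python
import random

def sample_initial_links(m, p, rng):
    """Draw one binomial(m, p) sample: up to m links, each formed with probability p."""
    return sum(1 for _ in range(m) if rng.random() < p)

def regulator_fraction(n_nodes, m, p, seed=0):
    """Fraction of entering nodes with at least one outbound tail (regulators)."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the run is repeatable
    regulators = sum(
        1 for _ in range(n_nodes) if sample_initial_links(m, p, rng) > 0
    )
    return regulators / n_nodes

if __name__ == "__main__":
    m, p = 20, 0.00490
    simulated = regulator_fraction(20000, m, p)
    expected = 1.0 - (1.0 - p) ** m  # probability of at least one tail
    print(f"simulated {simulated:.4f}, expected {expected:.4f}")
```

Under these assumptions the simulated regulator fraction should fluctuate around $1 - (1 - p)^m$, which is the per-node regulator probability used in the regulator count that follows.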

The average number of links in a network of size $N$ nodes is then 

$$ L = 2mpN = lN, \tag{6} $$

taking account of both heads and tails. Under the assumption that regulators can only be created on entry to the genome [20], the distribution of regulators at any time is specified by the distribution $P(j)$ for $t_{kk}$, so the probability that node $n_k$ is a regulator is $1 - P(0) = 1 - (1 - p)^m$. For a network of $N$ nodes, the predicted total number of regulators is then 

$$ R = \sum_{k=1}^{N} \left[1 - (1 - p)^m\right] = \left[1 - (1 - p)^m\right] N = c g_o N. \tag{7} $$

The final expression shows the expected behaviour for the number of regulators in the respective limits $p \to 0$ giving $R \to 0$, and $p \to 1$ giving $R \to N$. Comparison to the observed Eq. (3) provides the noted constraint, which reduces the number of free variables by one to justify this as a one-parameter model. Combining Eqs. (2), (6) and (7) and noting that $m$ is integral gives 

$$ l = 2mp = 2m\left[1 - (1 - c g_o)^{1/m}\right], \qquad m = 1, 2, \ldots, \tag{8} $$

which establishes the infinite number of possible modelling choices 

$$ \begin{aligned} (m, p, l) = {} & (1, 0.0935, 0.187) \\ & \vdots \\ & (20, 0.00490, 0.196) \\ & (40, 0.00245, 0.196) \\ & \vdots \end{aligned} \tag{9} $$

The values of the link formation probability $p$ over this range of $m$ values suggest overly short average promoter binding site lengths of between $-\log_4 p \in [1.7, 4.3]$ bases. These values are unreasonably low, though we are restricted from exploring arbitrarily large values for $m$ by our desire to develop a nonaccelerating network model: obtaining a promoter sequence length of about 6 requires $m > 400$, and such large $m$ values effectively implement an accelerating network model, as every regulator can effectively explore links to every operon in even large genomes. For modelling purposes, we set $m = 20$ and $p = 0.00490$ to give 

$$ l = 0.196. \tag{10} $$

This high link formation rate leads to the heavy density of regulators and regulated operons in Fig. 2. The average number of links per regulator using Eqs. (3) and (10) is then approximately $L/R = l/(c g_o) = 2.10$, a constant for all genomes which is reasonably close to the observed value of 5 for E. coli [?]. 
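The parameter choices quoted above can be reproduced with a short sketch, assuming the constraint $1 - (1 - p)^m = c g_o$ with $c g_o = 0.0935$ and the link density $l = 2mp$. The function names are hypothetical and chosen only for illustration.

```python
import math

C_G_O = 0.0935  # observed regulators per operon, c * g_o (value from the text)

def link_probability(m, c_g_o=C_G_O):
    """Solve 1 - (1 - p)^m = c*g_o for the link formation probability p."""
    return 1.0 - (1.0 - c_g_o) ** (1.0 / m)

def link_density(m, c_g_o=C_G_O):
    """Links per node, l = 2 m p."""
    return 2.0 * m * link_probability(m, c_g_o)

def site_length(p):
    """Implied binding site length in bases, -log_4 p."""
    return -math.log(p, 4)

if __name__ == "__main__":
    for m in (1, 20, 40, 400):
        p = link_probability(m)
        print(m, round(p, 5), round(link_density(m), 3), round(site_length(p), 2))
```

For $m = 1$, 20 and 40 this sketch returns the $(m, p, l)$ triples listed above to the quoted precision, and for $m = 400$ the implied site length comes out close to 6 bases, consistent with the stated bound.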



1. Random distribution of regulated operons — I 

The distribution of link heads for all nodes (with possession of a link head designating a regulated node) can be straightforwardly calculated under the assumption that the $t_{kk} \approx mp = l/2$ new tails added with node $n_k$ are randomly distributed across the $k$ existing nodes, so on average, each existing node receives $l/(2k)$ additional inbound links. The continuous approximation [29, 30, 31] for links randomly distributed over $k$ existing nodes determines the number of inbound head links for node $n_j$ according to 

$$ \frac{dh_{jk}}{dk} = \frac{t_{kk}}{k} = \frac{l}{2k}. \tag{11} $$

This can be integrated with initial conditions $h_{jj} \approx l/2$ at time $j$ and final conditions $h_{jN}$ at time $N$ to give 

$$ h_{jN} = \frac{l}{2}\left[1 + \ln\left(\frac{N}{j}\right)\right]. \tag{12} $$



The number of inbound regulatory links per node is then dependent on the age of each node. Integration of these link numbers over all node numbers $j$ gives the required total number of links as in Eq. (6). This distribution suggests that the oldest node $n_1$ for the E. coli genome with $N = 2528$ nodes will possess an average of $h_{1N} = 0.87$ inbound regulatory links while the most recent node $n_N$ will possess an average of $h_{NN} = 0.098$ inbound regulatory links (see the age dependent distributions of Fig. 2). 
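The two quoted values can be checked with a minimal sketch, assuming the continuum result reads $h_{jN} = (l/2)[1 + \ln(N/j)]$ with $l = 0.196$ and $N = 2528$. The function name is hypothetical.

```python
import math

def mean_heads(j, n, l=0.196):
    """Continuum estimate h_jN = (l/2) * (1 + ln(N/j)) of inbound links at node j."""
    return 0.5 * l * (1.0 + math.log(n / j))

if __name__ == "__main__":
    n = 2528
    print("oldest node:", round(mean_heads(1, n), 3))
    print("newest node:", round(mean_heads(n, n), 3))
    # Summing over all nodes should approximately recover the total L = l * N.
    total = sum(mean_heads(j, n) for j in range(1, n + 1))
    print("sum over nodes:", round(total, 1), "vs l*N:", round(0.196 * n, 1))
```

Under this reading, the oldest node carries roughly nine times the inbound links of the newest node, and the sum over all nodes is close to $lN$.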

The very useful continuum approach is not entirely accurate when applied to these nonaccelerating networks, and it is necessary to check later results using fuller derivations of the underlying joint probability distributions. In particular, the probability that by time $N$, node $n_k$ has received an initial $h_{kk} = j \in \{0, \ldots, m\}$ inbound links each with probability $p$, and subsequently received $j_k \in \{0, \ldots, m\}$ inbound links from itself each with probability $p/k$, as well as $j_{k+1} \in \{0, \ldots, m\}$ inbound links from node $n_{k+1}$ each with probability $p/(k+1)$, and so on until it receives $j_N \in \{0, \ldots, m\}$ inbound links from node $n_N$ each with probability $p/N$, is 

$$ P(j, j_k, j_{k+1}, \ldots, j_N) = \binom{m}{j} p^j (1 - p)^{m-j} \prod_{n=k}^{N} \binom{m}{j_n} \left(\frac{p}{n}\right)^{j_n} \left(1 - \frac{p}{n}\right)^{m - j_n}. \tag{13} $$

The average number of inbound links for node $n_k$ is then 

$$ \langle j + j_k + \cdots + j_N \rangle = \frac{l}{2}\left(1 + \frac{1}{k} + \cdots + \frac{1}{N}\right) \approx \frac{l}{2}\left[1 + \ln\left(\frac{N}{k}\right)\right], \tag{14} $$

as found by the continuum approach. (Later results will not match so closely.) 
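How closely the discrete sum tracks the continuum logarithm can be seen in a short sketch, assuming the mean inbound link number of node $n_k$ is $mp(1 + 1/k + \cdots + 1/N)$ with $m = 20$ and $p = 0.00490$ (so $mp = l/2$). The function names are hypothetical.

```python
import math

def exact_mean_heads(k, n, m=20, p=0.00490):
    """m*p*(1 + 1/k + ... + 1/N): mean inbound links of node k from the binomial terms."""
    harmonic_tail = sum(1.0 / i for i in range(k, n + 1))
    return m * p * (1.0 + harmonic_tail)

def continuum_mean_heads(k, n, m=20, p=0.00490):
    """Continuum counterpart (l/2) * (1 + ln(N/k)) with l = 2*m*p."""
    return m * p * (1.0 + math.log(n / k))

if __name__ == "__main__":
    n = 2528
    for k in (1, 10, 100, 1000, n):
        print(k, round(exact_mean_heads(k, n), 4), round(continuum_mean_heads(k, n), 4))
```

In this sketch the two estimates agree closely for all but the very oldest nodes, where the harmonic sum exceeds the logarithm by a few percent.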

The number of links per node is monotonically decreasing with node number: even though all nodes receive the same number of initial links on average, earlier nodes have a longer time to accumulate links than later nodes. This distribution contains information about both node connectivity and node age and so approximates genome statistics (simulated or observed) when all this information is available. However, it is usually the case that node age information is unavailable, necessitating calculation of connectivity distributions that are not conditioned on node age. This effectively requires binning together all nodes irrespective of their age to obtain a final link distribution. We can use the continuum approach for monotonically decreasing link numbers with node age [29, 30, 31] to discard the often unknown age information via 

$$ H(k, N) = \frac{1}{N} \int_0^N dj\, \delta(k - h_{jN}) = -\frac{1}{N} \left(\frac{\partial h_{jN}}{\partial j}\right)^{-1} \Bigg|_{j = j(k,N)}, \tag{15} $$



where $j(k, N)$ is the solution of the equation $k = h_{jN}$. For our case with the constraint $k = l[1 + \ln(N/j)]/2$, the final distribution of link heads absent age information is 

$$ H(k, N) = \frac{2}{l}\, e^{(1 - 2k/l)}, \tag{16} $$

showing an exponentially rapid decrease in the number of probable links. As every node initially receives a minimum of $l/2$ links, this distribution is normalized, $\int_{l/2}^{\infty} H(k, N)\, dk = 1$, and has average $\langle k \rangle = \int_{l/2}^{\infty} k H(k, N)\, dk = l$. The expected proportion of nodes $P_h(k)$ possessing $k$ inbound links is then obtained by integrating the continuous distribution of Eq. (16) over appropriate ranges $[l/2, 1/2]$ or $[k - 1/2, k + 1/2]$ to obtain 

$$ P_h(k) = \begin{cases} 1 - e^{(1 - 1/l)}, & k = 0 \\ 2 \sinh(1/l)\, e^{(1 - 2k/l)}, & k > 0. \end{cases} \tag{17} $$

Consequently, the distribution of inbound link numbers for regulated nodes (i.e. those with $k > 0$) is $P_h(k)/[1 - P_h(0)]$, or 

$$ P_r(k) = (e^{2/l} - 1)\, e^{-2k/l}, \tag{18} $$

which again is normalized to unity. 

These distributions for the number of inbound link heads per node and per regulated node permit the calculation of the number of unregulated operons $O_u$ via either $P(0, \ldots, 0)$ (from Eq. (13)) or $P_h(0)$ (from Eq. (17)). Thus, the total number of unregulated nodes is respectively 

$$ O_u = \begin{cases} \displaystyle\sum_{k=1}^{N} (1 - p)^m \prod_{n=k}^{N} \left(1 - \frac{p}{n}\right)^m \\ N\left[1 - e^{(1 - 1/l)}\right]. \end{cases} \tag{19} $$

The top line here shows the expected behaviour with $p \to 1$ giving $O_u \to 0$ and $p \to 0$ giving $O_u \to N - lN$ as required. The second line, derived using the continuum approximation, fails to exhibit the desired dependency on link number as $l \to 0$, demonstrating that care must be taken in using this approach. Using the more accurate top line, the number of regulated nodes is then approximately $O_r = N - O_u \approx lN$, so in turn, the number of inbound links per regulated node is $L/O_r = 1$. A direct calculation of the average number of inbound links for regulated operons using the distribution of Eq. (18) gives $\langle k \rangle = \sum_{k=1}^{\infty} k P_r(k) = 1/[1 - \exp(-2/l)] = 1.00004$, close to the value of 1.5 or 1.6 observed in E. coli [?]. In addition, the average number of inbound regulatory links per operon (for all operons) is $\langle k \rangle = L/N = l = 0.196$. The predicted distribution of inbound links for regulated operons (Eq. (18)) can be compared to that observed in the E. coli network of size $N = 2528$ operons [25], and is shown in Fig. 3. The overly rapid decay of the calculated distribution poorly approximates the compact exponential distribution observed for E. coli shown in Fig. 2(d) of Ref. [?] and Fig. 5 of Ref. [?], leading to an underestimation of the numbers of regulated operons with 2 or more inputs: essentially no operons are predicted to have 2 or more inputs for genomes of size $N = 2528$ operons. 

FIG. 3: The predicted proportions $P_r(k)$ of the regulated operons of E. coli taking multiple regulatory inputs for genomes of any size. This distribution poorly approximates that observed for E. coli with $N = 2528$ operons in Fig. 2(d) of Ref. [?] and Fig. 5 of Ref. [?]. 

2. Scale-free distribution of regulator operons — I 

At time $k$, the $h_{kk} \approx l/2$ inbound links associated with node $n_k$ have their tails preferentially attached to existing regulatory nodes $n_j$ with probability proportional to the number of existing regulatory links for that node at time $k$, i.e. $t_{jk}$. Using the continuous approximation [29, 30, 31], the rate of growth in outbound link number for node $n_j$ is then approximately 

$$ \frac{dt_{jk}}{dk} = h_{kk}\, \frac{t_{jk}}{\int_0^k t_{jk}\, dj}. \tag{20} $$



The denominator here is a probability weighting to ensure normalization and is the total number of outbound links for all nodes at network size $k$. Following Ref. [?], we can evaluate the denominator using the identity 

$$ \frac{\partial}{\partial k} \int_0^k t_{jk}\, dj = \int_0^k \frac{\partial t_{jk}}{\partial k}\, dj + t_{kk}. \tag{21} $$

This can be evaluated using Eq. (20), noting $t_{kk} \approx h_{kk} \approx l/2$, giving 

$$ \frac{\partial}{\partial k} \int_0^k t_{jk}\, dj = l, \tag{22} $$

which can be integrated, determining the denominator of Eq. (20) to be 

$$ \int_0^k t_{jk}\, dj = lk. \tag{23} $$

This is in agreement with Eq. (6). Substituting this value into Eq. (20) gives 

$$ \frac{dt_{jk}}{dk} = \frac{t_{jk}}{2k}. \tag{24} $$

Finally, this can be integrated with initial conditions $t_{jj} \approx l/2$ at time $j$ and final conditions $t_{jN}$ at time $N$ to give 

$$ t_{jN} = \frac{l}{2} \left(\frac{N}{j}\right)^{1/2}. \tag{25} $$



Because we are now considering outbound links, we must take account of our use of two classes of distinguishable nodes, regulators and non-regulators, by allowing for the known distribution of regulators with node number over the genome. The average link number per node at node $n_j$ (Eq. (25)) equates to the product of the average number of link tails per regulator at node $n_j$, denoted $t_r(j, N)$, and the average number of regulators per node at node $n_j$, denoted $\rho(j)$. This latter density is $\rho(j) = dR(j)/dj = c g_o$ by Eq. (7), so by definition we have 

$$ t_{jN} = t_r(j, N)\, \rho(j), \tag{26} $$

giving 

$$ t_r(j, N) = \frac{l}{2 c g_o} \left(\frac{N}{j}\right)^{1/2}. \tag{27} $$



Again we find a monotonically decreasing number of links per regulator with node number or age, so older nodes are more heavily connected (see Fig. 2). Our treatment here effectively duplicates previous results for networks adding a constant deterministic number of links per node [7]. 
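The age dependence of regulator connectivity can be made concrete with a minimal sketch, assuming the continuum result reads $t_r(j, N) = [l/(2 c g_o)](N/j)^{1/2}$ with $l = 0.196$, $c g_o = 0.0935$ and $N = 2528$. The function name is hypothetical.

```python
import math

def tails_per_regulator(j, n, l=0.196, c_g_o=0.0935):
    """Continuum estimate t_r(j, N) = l / (2 c g_o) * sqrt(N / j)."""
    return l / (2.0 * c_g_o) * math.sqrt(n / j)

if __name__ == "__main__":
    n = 2528
    print("oldest regulator:", round(tails_per_regulator(1, n), 1))
    print("newest regulator:", round(tails_per_regulator(n, n), 2))
    # Averaging over node positions should land near l / (c g_o).
    mean = sum(tails_per_regulator(j, n) for j in range(1, n + 1)) / n
    print("mean links per regulator:", round(mean, 2))
```

Under these assumptions the oldest regulator carries about 53 links and the newest about one, with a mean near $l/(c g_o)$.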

As usual, we again use the continuum approach for monotonically decreasing link numbers with node age (Eq. (15), noting $k = l (N/j)^{1/2}/(2 c g_o)$) [29, 30, 31] to discard the often unknown age information in the $t_r(j, N)$ distribution to obtain the outbound link distribution 

$$ T(k, N) = \frac{1}{2} \left(\frac{l}{c g_o}\right)^2 \frac{1}{k^3}, \tag{28} $$

which is normalized over the range $[k_0 = l/(2 c g_o) \approx 1.05, \infty)$, as $\int_{k_0}^{\infty} T(k, N)\, dk = 1$. In turn, the expected proportion of regulators $P_t(k)$ possessing $k$ links is then obtained by integrating the continuous distribution of Eq. (28) over appropriate ranges $[k_0, 3/2]$ or $[k - 1/2, k + 1/2]$ to obtain 

$$ P_t(k) = \begin{cases} 1 - \left(\dfrac{l}{3 c g_o}\right)^2, & k = 1 \\ \left(\dfrac{l}{c g_o}\right)^2 \dfrac{8k}{(4k^2 - 1)^2}, & k > 1. \end{cases} \tag{29} $$

As required, this is normalized to unity. The average number of outbound links per regulator is, using the continuous distribution $T(k, N)$, $\langle k \rangle = \int_{k_0}^{\infty} k T(k, N)\, dk = l/(c g_o) = 2.09$, and numerically calculated to be $\langle k \rangle = 1.98$ using the $P_t(k)$ distribution (complementing previous estimates following Eq. (10)), each of which compares well to the observed value of 5 in E. coli [26]. 

However, the very rapid (cubic) decrease in probable link numbers means that these distributions have difficulty in reproducing the distributions observed in E. coli. The expected outbound link distribution appears in Fig. 4, showing a long-tailed and scale free distribution with probabilities scaling roughly as $P_t(k) \propto k^{-3}$. The $P_t(k)$ distribution shows that a full 51% of regulators have only one link, while 83% have two or fewer links, and 91% have three or fewer links. In particular, the expected number of regulators with $k$ links is $P_t(k) R$, with the number of regulators $R$ obtained from Eq. (7) (or from observation). For E. coli with $N = 2528$ operons [25], this predicts the probable existence of only one E. coli regulator possessing link numbers in the range $[20, \infty)$. This poorly approximates the connectivity of E. coli, where many regulators regulate more than 20 operons, including the global food sensor CRP which regulates up to 197 genes directly [28]. In fact, Eq. (27) with $j = 1$ predicts that the most heavily connected node in E. coli has only about 53 downstream links. 

FIG. 4: The predicted proportion of regulatory operons $P_t(k)$ regulating $k$ different operons for a simulated E. coli genome with $N = 2528$ operons. As expected, most regulators regulate only one other operon, though a small number of regulators can regulate more than 10 operons. This distribution poorly approximates the observed proportions for E. coli in Fig. 2(c) of Ref. [?] and Fig. 4 of Ref. [?], and predicts the probable existence of only one E. coli regulator possessing link numbers in the range $[20, \infty)$. 
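The binned proportions can be explored with a short sketch, assuming the outbound link density is $T(k) = 2 k_0^2 / k^3$ with $k_0 = l/(2 c g_o)$, integrated over $[k_0, 3/2]$ for $k = 1$ and $[k - 1/2, k + 1/2]$ otherwise. The function names are hypothetical, and $R$ is taken here as $c g_o N$.

```python
def p_tails(k, l=0.196, c_g_o=0.0935):
    """Proportion of regulators with k outbound links (binned cubic density)."""
    ratio_sq = (l / c_g_o) ** 2
    if k == 1:
        return 1.0 - ratio_sq / 9.0
    return ratio_sq * 8.0 * k / (4.0 * k * k - 1.0) ** 2

def tail_fraction(k_min, l=0.196, c_g_o=0.0935):
    """Fraction of regulators with at least k_min >= 2 links (integral from k_min - 1/2)."""
    k0_sq = (l / (2.0 * c_g_o)) ** 2
    return k0_sq / (k_min - 0.5) ** 2

if __name__ == "__main__":
    print("one link:", round(p_tails(1), 3))
    print("three or fewer:", round(sum(p_tails(k) for k in (1, 2, 3)), 3))
    regulators = 0.0935 * 2528  # R = c g_o N
    print("expected regulators with >= 20 links:",
          round(tail_fraction(20) * regulators, 2))
```

With these inputs roughly half of all regulators have a single link, about 91% have three or fewer, and fewer than one regulator on average is expected with 20 or more links.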



3. Cascades and regulatory islands — I 

Nonaccelerating networks have stationary statistics which are independent of network size. In particular, the proportion of the network present in islands of nodes of various sizes connected by regulatory links is independent of genome size. As prokaryote regulatory networks likely consist of functionally distinct regulated modules [27, 32] with a marked absence of regulatory cycles (at least in E. coli [?, 27, 28]), any network model must be able to adequately reproduce the statistics of regulatory islands and cascades. 

The proportion of transcription factors which control downstream regulators is 

$$ P_{rr}(N) = \frac{1}{R} \sum_{k=1}^{N} \left[1 - (1 - p)^m\right] \frac{l}{2 c g_o} \left(\frac{N}{k}\right)^{1/2} \frac{R}{N} \approx l. \tag{30} $$

Here, the first fraction on the RHS normalizes the proportion in terms of the number of regulators $R$ (Eq. (7)), the first term in the summation is the probability that node $n_k$ is a regulator, the second term is the average number of regulatory outbound links for this regulatory node $t_r(k, N)$ at network size $N$ (Eq. (27)), and the third term approximates the probability that these nodes link to one of the existing regulators under random attachment. (If the very first and very last terms are dropped, the remaining summation over all nodes of the probability that $n_k$ is regulatory with the stated number of links equates to the total number of links in the network $L = lN$. This is the more accurate version of the calculation leading to Eq. (27).) Hence, the proportion of regulators which control transcription factors is independent of network size and equals 19.6%. This ratio compares reasonably well with that observed in E. coli, where Ref. [28] noted that 31.4% regulate other transcription factors. 
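The size independence of this proportion can be checked numerically with a minimal sketch, assuming the summand is the product of the regulator probability $1 - (1 - p)^m$, the links per regulator $[l/(2 c g_o)](N/k)^{1/2}$, and the random attachment factor $R/N$, all normalized by $R$. The function name is hypothetical.

```python
import math

def p_rr(n, m=20, p=0.00490, l=0.196):
    """Discrete sum for the proportion of regulators that regulate other regulators."""
    c_g_o = 1.0 - (1.0 - p) ** m   # probability that a node is a regulator
    r = c_g_o * n                  # expected number of regulators
    total = 0.0
    for k in range(1, n + 1):
        t_r = l / (2.0 * c_g_o) * math.sqrt(n / k)  # links per regulator at node k
        total += c_g_o * t_r * (r / n)
    return total / r

if __name__ == "__main__":
    for n in (500, 2528, 10000):
        print(n, round(p_rr(n), 4))
```

In this sketch the discrete sum sits slightly below $l = 0.196$ and approaches it as $N$ grows, which is the sense in which the proportion is independent of network size.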

As the proportion of regulators of transcription fac- 
tors rises, the probable length of regulatory cascades in- 
creases. In fact, the proportion of regulators taking part 
in a regulatory cascade of length n > 1 is 



Pn = (1 - P rr )P r r . 



(31) 



This equation can be obtained from a tree of all binary
pathways which at each branching point either terminate
with probability $(1 - P_{rr})$ or cascade with probability
$P_{rr}$. As such, the probable cascade length is negligible
when the proportion of regulators controlling regulators
is small, $P_{rr} \ll 1$, but can become large as $P_{rr}$
itself increases. The calculated lengths of regulatory cascades
can be compared to those in E. coli where 37.7% are of
length two, 52.5% are three-level cascades, and 9.8% are
four-level cascades [28]. As one-level or autoregulatory
interactions are not included in this observation, the
predicted proportions for E. coli are $p_n \to p_n/(1 - p_1)$ with
$P_{rr} = 19.6\%$ giving 80% two-level cascades, 16% three-level
cascades, 3% four-level cascades, 1% five-level cascades,
and so on. It is seen that the theoretical predictions
overestimate the proportion of two-level cascades
and underestimate the number of three-level cascades,
probably because of selection pressures not included in
the model, while other calculated values closely approximate
those observed.
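
The renormalised cascade proportions quoted above can be
checked numerically. The following is a minimal sketch,
assuming the geometric form $p_n = (1 - P_{rr}) P_{rr}^{n-1}$
for Eq. 31, which reproduces the quoted percentages. The
function name `cascade_proportions` and the `observed`
dictionary are illustrative choices; the observed values are
those quoted in the text for E. coli.

```python
def cascade_proportions(p_rr, max_level=5):
    """Predicted share of n-level cascades for n >= 2.

    Assumes p_n = (1 - p_rr) * p_rr**(n - 1) and renormalises by
    (1 - p_1), with p_1 = 1 - p_rr, to exclude one-level interactions.
    """
    p_1 = 1.0 - p_rr
    shares = {}
    for n in range(2, max_level + 1):
        p_n = (1.0 - p_rr) * p_rr ** (n - 1)
        shares[n] = p_n / (1.0 - p_1)
    return shares


# One-parameter model value quoted in the text.
predicted = cascade_proportions(0.196)

# Observed E. coli proportions quoted in the text.
observed = {2: 0.377, 3: 0.525, 4: 0.098}

for n, share in predicted.items():
    obs = observed.get(n)
    obs_text = f"{obs:.1%}" if obs is not None else "n/a"
    print(f"{n}-level: predicted {share:.1%}, observed {obs_text}")
```

The same function can be called with any other value of
$P_{rr}$ to see how quickly the predicted cascade lengths
grow as more regulators control regulators.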

We note that this model is entirely unable to explain 
the high proportion of autoregulation observed in E. coli 
with various estimates that 28.1% |33|. 50% [2(| and 
46.9% [23 of regulators are autoregulatory. The pre- 
dicted proportion of autoregulators is approximated by 
replacing the very last fraction ($R/N$) in Eq. 30 by the
term $1/N$ giving the probability that a self-directed link
is formed, leading to the expected autoregulatory proportion
$\approx l/(cg_0 N) \approx 0.08\%$ for E. coli. This failure likely
reflects the action of selection processes promoting spa- 
tial rearrangements of entire regulons on the genome and 
the internal shuffling of genes and promotor units. Such 
reorganizations of duplicated gene regions (presumably 
shuffling genes and promotor regions) have been common
in E. coli, allowing for instance spatial regulatory
motifs whereby the promotors of colocated (overlapping)
and often co-functional operons transcribed in opposing
directions can interfere [34].
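
The size of this shortfall can be illustrated with a short
calculation. This is a sketch only: it takes $l = 0.196$ and
$N = 2528$ from the text and assumes $cg_0 \approx 0.0935$,
the value implied by $2rm = 0.374$ with $m = 2$ and
$r = cg_0$ in the two-parameter model. The variable names
are illustrative.

```python
# Values quoted in the text; cg0 is an assumed value (see lead-in).
links_per_node = 0.196   # l, average links per node (one-parameter model)
cg0 = 0.0935             # assumed probability that a node is a regulator
N = 2528                 # E. coli operon count

R = cg0 * N              # expected number of regulators
L = links_per_node * N   # expected total number of links

# Replace the last factor (R/N) of the P_rr sum by 1/N: the chance that
# a randomly attached link head lands on the regulator it came from.
p_auto = (1.0 / R) * L * (1.0 / N)

# Lowest and highest observed estimates quoted in the text.
observed_low, observed_high = 0.281, 0.50

print(f"predicted autoregulatory proportion: {p_auto:.3%}")
print(f"observed estimates are roughly {observed_low / p_auto:.0f}x to "
      f"{observed_high / p_auto:.0f}x larger")
```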

We now turn to consider the size of the largest con- 
nected island in growing prokaryote gene networks featur- 
ing directed links whose tails are preferentially attached 
to regulators and whose heads are randomly distributed 
over all existing nodes. For simplicity, we define an is- 
land to consist of all nodes which are linked regardless of 
the orientation of all links and so effectively treat links 
as being undirected. This is because a regulator can po- 
tentially perturb every node downstream to it includ- 
ing those nodes downstream of other regulators and so 
can modify the regulatory effects of other regulators — 
essentially, if the downstream effects of different regula- 
tors eventually intersect, we count these regulators in the 
same island. (Other definitions of islands could be used.)

FIG. 5: The total number of discrete disconnected islands $i_{all}$,
the number of islands with respectively two ($i_2$), three ($i_3$)
and four ($i_4$) members (left hand axis), and the simulated
($\langle s_1 \rangle$) and predicted ($s_1$) size of the largest island measured
as a proportion of nodes for various genome sizes (right hand
axis).

The growth of the largest island can be both directly 
simulated and calculated under the continuum approx- 
imation [2(j (though this simple approach is indicative 
only and is quite sensitive to for instance, the assumed 
average size of external islands). The dominant (but
not sole) mechanism by which island $s_1$ can grow is for
the newly added node to either (a) be a regulator
(with probability $[1 - (1-p)^m] = cg_0$) and establish
an outbound regulatory link to some existing node in $s_1$
(with probability $s_1/k$) while at the same time accepting
a regulatory link (with probability $cg_0$) from a node
in a different island $s_{j \neq 1}$ (with probability $(k - s_1)/k$),
or (b) accept an inbound regulatory link (with probability
$cg_0$) from a regulator in island $s_1$ (with probability
$s_1/k$) while establishing a regulatory link (with
probability $cg_0$) to some node in a different island $s_{j \neq 1}$ (with
probability $(k - s_1)/k$). (Here, we assume that regulators
are uniformly distributed over islands and the number of
links within an island scales with the size of the island
to crudely model preferential attachment.) The result
is that island $s_1$ grows by the size of the second island
($\langle s_{j \neq 1} \rangle$). Altogether, the rate of growth in the size of island
$s_1$ is then

$$\frac{ds_1}{dk} = 2 (cg_0)^2 \, \frac{s_1 [k - s_1]}{k^2} \, \langle s_{j \neq 1} \rangle. \qquad (32)$$

For initial conditions, we assume that a first link appears
when the genome has $1/cg_0 \approx 11$ nodes ($s_1(11) = 2$). A
consistent solution for this equation appears with island
size growing linearly with genome size, $s_1 = ak$, with
$a = 1 - 1/[2(cg_0)^2 \langle s_{j \neq 1} \rangle]$ under the assumption that
sufficient small islands are created to ensure $\langle s_{j \neq 1} \rangle$ remains
a constant. Simulations show the average size of outside
islands to be very closely $\langle s_{j \neq 1} \rangle = 3.31$ over a large range
of genome sizes, though a reasonable match between theory
and simulation requires setting $\langle s_{j \neq 1} \rangle = 50$. This is
reasonable given the approximations made. Fig. 5 shows
the number of all discrete islands as well as the number of
islands containing two, three and four components, and
the predicted and simulated sizes of the largest island
expressed as a proportion of the total genome size with a
close match between theory and simulation. This figure
suggests that the E. coli genome of N = 2528 operons
should possess a giant component containing about 4%
of all nodes or about 100 operons. This can be compared
to the observed figure where about 300 operons of
the examined regulatory and regulated operons (but not
including unregulated and nonregulatory operons) can
be loosely grouped into 3-6 "dense overlapping regulons"
or DORs of about 50 operons each while the remaining
operons appeared as disjoint systems with most containing
1-3 operons but some containing up to 25 operons
[26]. The constant proportion of the genome taken up by
the largest island, and the constantly growing number
of discrete islands, means that this network architecture
suffers no maximum size limit. As a result, this approach
is unable to explain the upper size limit observed in the
evolutionary record.

B. Two-parameter nonaccelerating prokaryote 
network model 

The above one-parameter model combined the probability
of forming a link $p$ and the maximum number
of links $m$ to give the probable number of regulators
formed $cg_0$. A two-parameter nonaccelerating model
can be constructed by delinking these variables so that
the probability of being a regulator is given directly by
$p \to r = cg_0$ leaving the number of links established $m$
as a free parameter. This gives the number of regulators
as $R = cg_0 N = rN$. We assume that every regulator $n_k$
gains exactly $t_{kk} = m$ outbound links on entry to the
genome which are randomly distributed as inbound link
heads over all existing nodes. The average number of
initial outbound links per regulator is then $m$ while the
average number of outbound links per node is $rm$. These
outbound link tails must be balanced by uniformly
distributed inbound link heads so consequently, we assume
that all nodes on entry to the genome receive inbound
regulatory links distributed according to



1 



(33) 



giving the required average of inbound links per node
of $\langle h_{kk} \rangle = rm$. The average number of links is
$L = 2rmN$ taking account of both heads and tails. Hence, the
number of outbound links per regulator is $L/R \approx 2m$, so
setting $m = 2$ allows a close fit between this model and
the value of 5 observed in E. coli. This sets $L = 2rmN = 0.374N$
giving the number of inbound links per node as
$L/N = 2rm = 0.374$. The values of the link formation
probability $r$ suggest an overly short average promotor
binding site length of $-\log_4 r = 1.71$ bases.
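
The parameter values quoted in this paragraph follow from
one another, as the short sketch below illustrates. It
assumes only the figures given in the text ($m = 2$,
$L/N = 0.374$, $N = 2528$); the variable names are
illustrative.

```python
import math

m = 2                    # outbound links per regulator on entry
links_per_node = 0.374   # L / N = 2 r m, as quoted in the text
N = 2528                 # E. coli operon count

r = links_per_node / (2 * m)     # probability that a node is a regulator
R = r * N                        # expected number of regulators
L = links_per_node * N           # expected total number of links

# Binding site length (in bases) at which a random 4-letter sequence
# matches with probability r, i.e. 4**(-length) = r.
site_length = -math.log(r, 4)

print(f"r = {r:.4f}, R = {R:.0f} regulators, L = {L:.0f} links")
print(f"links per regulator L/R = {L / R:.1f}")
print(f"implied binding site length = {site_length:.2f} bases")
```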



1. Random distribution of regulated operons — II

With respect to the distribution of inbound regulatory 
links, the two-parameter model does not differ in any 
material respect from the earlier one-parameter model 
as in both cases links are uniformly distributed over all 
nodes. However, the link formation probability differs in 
each approach, so all of the results of Eqs. 1111 to H^l can 
be used with the changes $p \to r = cg_0$, $m = 20 \to m = 2$,
and $l = 0.196 \to 2rm = 0.374$.

Consequently, the distribution of inbound regulatory
heads over all nodes is

$$h_{kN} = rm \left[ 1 + \ln \frac{N}{k} \right], \qquad (34)$$



again monotonically decreasing with node index $k$, so that
older nodes carry more inbound links. This distribution
suggests that the oldest node $n_1$ for the E. coli
genome with N = 2528 nodes will possess $h_{1N} = 1.65$
inbound regulatory links while the most recent node $n_N$
will possess $h_{NN} = 0.187$ inbound regulatory links (see
the age dependent distributions of Fig. 6).
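
The two quoted endpoint values can be reproduced with a few
lines of code. This is a sketch under an assumption: it
takes the age-dependent distribution to have the form
$h_{kN} = rm[1 + \ln(N/k)]$, which matches both quoted
values, with $r = 0.0935$ and $m = 2$ as implied by
$2rm = 0.374$. The function name `inbound_heads` is
illustrative.

```python
import math

r, m, N = 0.0935, 2, 2528


def inbound_heads(k, n_total, r=r, m=m):
    """Assumed form: h_kN = r*m*(1 + ln(N/k)) for the k-th node to enter."""
    return r * m * (1.0 + math.log(n_total / k))


oldest = inbound_heads(1, N)   # first node to enter the genome
newest = inbound_heads(N, N)   # most recently added node

# Summing over all nodes should approximately recover L = 2 r m N.
total_heads = sum(inbound_heads(k, N) for k in range(1, N + 1))

print(f"oldest node: {oldest:.2f} inbound links")
print(f"newest node: {newest:.3f} inbound links")
print(f"sum over nodes / (2 r m N) = {total_heads / (2 * r * m * N):.4f}")
```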

Following the previous derivation, the distribution of
inbound link numbers for regulated nodes (i.e. those with
$k > 0$) is

$$P_r(k) = \left( e^{1/rm} - 1 \right) e^{-k/rm}, \qquad (35)$$

which again is normalized to unity. As previously, the
number of regulated nodes is approximately
$O_r = N - O_u \approx 2rmN$, so in turn, the number of inbound links
per regulated node is $L/O_r = 1$. A direct calculation
of the average number of inbound links for regulated
operons using the distribution of Eq. 35 gives
$\langle k \rangle = \sum_{k=1}^{\infty} k P_r(k) = 1/[1 - \exp(-1/rm)] = 1.0047$, close
to the value of 1.5 or 1.6 observed in E. coli prllHl^. 
The predicted distribution of inbound links for regulated
operons (Eq. 35) can be compared to that observed in
the E. coli network of size N = 2528 operons [25], and
is shown in Fig. 7. Again, the overly rapid decay of
the calculated distribution poorly approximates the compact
exponential distribution observed for E. coli shown
in Fig. 2(d) ofRef. and of Fig. 5ofRef. [H leading 
to an underestimation of the numbers of regulated oper- 
ons with 3 or more inputs — essentially no regulators are 
predicted to have 3 or more inputs for genomes of size 
N = 2528 operons. 
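
The rapid decay can be made concrete with a short numerical
check. The sketch below assumes the exponential form
$P_r(k) = (e^{1/rm} - 1) e^{-k/rm}$ for the distribution of
Eq. 35, which gives the quoted mean of 1.0047, together with
$r = 0.0935$, $m = 2$ and $N = 2528$. The function name
`p_regulated` and the truncation of the sums at $k = 59$ are
illustrative choices.

```python
import math

r, m, N = 0.0935, 2, 2528
rm = r * m


def p_regulated(k, rm=rm):
    """Assumed form: P_r(k) = (exp(1/rm) - 1) * exp(-k/rm), k = 1, 2, ..."""
    return (math.exp(1.0 / rm) - 1.0) * math.exp(-k / rm)


ks = range(1, 60)   # terms beyond this are negligibly small
norm = sum(p_regulated(k) for k in ks)
mean_inputs = sum(k * p_regulated(k) for k in ks)
closed_form = 1.0 / (1.0 - math.exp(-1.0 / rm))

# Expected number of regulated operons with three or more inputs.
regulated = 2 * rm * N
expected_3plus = regulated * sum(p_regulated(k) for k in ks if k >= 3)

print(f"normalisation = {norm:.6f}")
print(f"mean inputs = {mean_inputs:.4f} (closed form {closed_form:.4f})")
print(f"expected operons with >= 3 inputs = {expected_3plus:.3f}")
```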




FIG. 6: An example statistically generated E. coli genome us- 
ing a two-parameter constant growth model using the same 
settings as in Fig. ^ Note that the distributions of regulators
and of regulated operons over the genome are uncorrelated
with age, while the numbers of inbound and outbound
regulatory links are strongly correlated with age.






FIG. 7: The predicted proportions $P_r(k)$ of the regulated operons
of E. coli taking multiple regulatory inputs for genomes of
any size. This distribution poorly approximates that observed
for E. coli with N = 2528 operons in Fig. 2(d) of Ref.
and of Fig. 5 of Ref. [26].

2. Scale-free distribution of regulator operons — II

As previously, the rate of growth in outbound link
number for node $n_j$ is approximately

$$\frac{dt_{jk}}{dk} = rm \, \frac{t_{jk}}{2rmk}. \qquad (36)$$



Here, the denominator on the right hand side is the
expected number of existing links in a network of size $k$
nodes. Noting initial conditions $t_{jj} = rm$, and $h_{kk} = rm$,
we have

$$t_{jN} = rm \left( \frac{N}{j} \right)^{1/2}. \qquad (37)$$



As previously, this is the density of outbound regulatory 
links per node which equates to the density of outbound 
regulatory links per regulator times the density of reg- 
ulators per node. As this latter density is uniform over 
the genome and equal to R/N = r, then the density of 
outbound links per regulator is 



$$t_r(j,N) = m \left( \frac{N}{j} \right)^{1/2}. \qquad (38)$$



Again, this distribution is monotonically decreasing with
node index $j$, so older nodes are more heavily connected
(see Fig. 6). With the additional degree of freedom offered
by the independent parameter $m$, this distribution shows
the most heavily connected regulators having around 100
links in E. coli (with $j = 1$, $m = 2$, and $N = 2528$).
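
As a quick illustration, the age-dependent link numbers can
be evaluated directly. The sketch below assumes
$t_r(j,N) = m(N/j)^{1/2}$ with $m = 2$ and $N = 2528$, and
averages uniformly over entry positions $j$ on the grounds
that the density of regulators is stated to be uniform over
the genome. The function name `outbound_links` is
illustrative.

```python
import math

m, N = 2, 2528


def outbound_links(j, n_total, m=m):
    """t_r(j, N) = m * sqrt(N / j) for a regulator that entered as node j."""
    return m * math.sqrt(n_total / j)


links = [outbound_links(j, N) for j in range(1, N + 1)]

largest = links[0]             # oldest possible regulator, j = 1
smallest = links[-1]           # newest possible regulator, j = N
mean_links = sum(links) / N    # uniform average over entry positions

print(f"oldest regulator: about {largest:.0f} outbound links")
print(f"newest regulator: {smallest:.0f} outbound links")
print(f"average over ages: {mean_links:.2f} outbound links per regulator")
```

The average sits just below $2m = 4$, with the bulk of
regulators close to the minimum of $m$ links and only the
very oldest approaching 100.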

The often unknown age information in the $t_r(j,N)$
distribution can be discarded using the continuum approach
(noting $k = m(N/j)^{1/2}$) to obtain the outbound link
distribution

$$T(k,N) = \frac{2m^2}{k^3}, \qquad (39)$$



which is normalized over the range $[m, \infty)$ as
$\int_m^\infty T(k,N) \, dk = 1$. In turn, the expected proportion of
regulators $P_t(k)$ possessing $k$ links is then obtained by
integrating the continuous distribution of Eq. 39 over
appropriate ranges $[m, 5/2]$ or $[k - 1/2, k + 1/2]$ to obtain

$$P_t(k) = \begin{cases} 1 - \dfrac{4m^2}{25}, & k = 2, \\[1ex] \dfrac{32 m^2 k}{(4k^2 - 1)^2}, & k > 2. \end{cases} \qquad (40)$$



Here, $k \geq 2$ as the choice $m = 2$ means that the minimum
number of links that a regulator can possess is two. As
required, this is normalized to unity. The average number
of outbound links per regulator is, using the continuous
distribution $T(k,N)$, $\langle k \rangle = \int_m^\infty k \, T(k,N) \, dk = 2m = 4$ and
numerically calculated to be $\langle k \rangle = 3.9$ using the $P_t(k)$
distribution, each of which compares well to the observed
value of 5 in E. coli j2(J. 

Again, the very rapid (cubic) decrease in probable link
numbers means that these distributions have difficulty in

FIG. 8: The predicted proportion of regulatory operons $P_t(k)$
regulating $k$ different operons for arbitrary gene networks.
As expected, most regulators regulate only two or three operons,
though a small number of regulators can regulate more than 50
operons. This distribution poorly approximates the observed
proportions for E. coli in Fig. 2(c) of Ref. ]2djj and Fig. 
4 of Ref. and predicts the probable existence of one E. 

coli regulator possessing link numbers in each of the respective
ranges [70, 99] and [100, $\infty$).



reproducing the distributions observed in E. coli [25, 28].
The expected outbound link distribution appears in Fig. 8
showing that 36% of regulators have two links, while
67% have three or fewer links, and 80% have four or fewer
links. In particular, the expected number of regulators
with $k$ links is $P_t(k)R$ with $R = rN$. For E. coli with
N = 2528 operons [25], this predicts the probable existence
of only one E. coli regulator possessing link numbers in
each of the respective ranges between [70, 99] links and in
the range [100, $\infty$) links. This poorly approximates the
connectivity of E. coli where many regulators regulate
more than 20 operons including the global food sensor
CRP which regulates up to 197 genes directly [28].
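
The cumulative percentages quoted above can be checked with
a short script. This is a sketch assuming that the discrete
distribution is obtained by integrating $2m^2/k^3$ over
$[m, 5/2]$ for $k = 2$ and over $[k - 1/2, k + 1/2]$ for
$k > 2$, with $m = 2$. The function name `p_out` and the
truncation point `K_MAX` are illustrative choices.

```python
M = 2  # minimum (and initial) number of outbound links per regulator


def p_out(k):
    """Proportion of regulators with k outbound links, written for m = 2."""
    if k < M:
        return 0.0
    if k == M:
        # Integral of 2 m^2 / k^3 over [m, 5/2].
        return 1.0 - 4.0 * M**2 / 25.0
    # Integral of 2 m^2 / k^3 over [k - 1/2, k + 1/2].
    return 32.0 * M**2 * k / (4.0 * k**2 - 1.0) ** 2


K_MAX = 200_000  # truncation of the (slowly converging) sums

cumulative = {}
running = 0.0
for k in range(M, 5):
    running += p_out(k)
    cumulative[k] = running

total = sum(p_out(k) for k in range(M, K_MAX))
mean_links = sum(k * p_out(k) for k in range(M, K_MAX))

for k, share in cumulative.items():
    print(f"{share:.0%} of regulators have {k} or fewer links")
print(f"normalisation = {total:.6f}, mean links = {mean_links:.2f}")
```

The discrete mean comes out slightly below the continuum
value of $2m = 4$; the exact figure depends a little on
where the sum is truncated.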



3. Cascades and regulatory islands — II 

The proportion of transcription factors which control
downstream regulators is

$$P_{rr}(N) = \frac{1}{R} \sum_{k=1}^{N} r \, m \left( \frac{N}{k} \right)^{1/2} \frac{R}{N} = 2rm. \qquad (41)$$



Here, the derivation follows that of Eq. 30. Again, the
proportion of regulators which control transcription factors
is independent of network size and equals 37.4%,
which compares well with the 31.4% observed in E. coli
[28].
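
The size independence of this proportion rests on the
continuum approximation $\sum_{k=1}^{N} (N/k)^{1/2} \approx 2N$.
The sketch below compares the discrete sum with the
continuum value $2rm$, assuming summand terms $r$,
$m(N/k)^{1/2}$ and $R/N$ with $r = 0.0935$ and $m = 2$. The
function name `p_rr_discrete` is illustrative.

```python
import math

r, m = 0.0935, 2


def p_rr_discrete(n_total, r=r, m=m):
    """Evaluate (1/R) * sum_k r * m * sqrt(N/k) * (R/N) term by term."""
    regulators = r * n_total
    total = sum(
        r * m * math.sqrt(n_total / k) * (regulators / n_total)
        for k in range(1, n_total + 1)
    )
    return total / regulators


p_rr_continuum = 2 * r * m

for n_total in (500, 2528, 10_000):
    print(f"N = {n_total:>6}: discrete sum = {p_rr_discrete(n_total):.4f}, "
          f"continuum = {p_rr_continuum:.4f}")
```

The discrete sum approaches the continuum value from below
as $N$ grows, so the approximation is already within a few
percent at E. coli sizes.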

As previously, the proportion of regulators taking part
in a regulatory cascade of length $n$ is $p_n \to p_n/(1 - p_1)$
with $P_{rr} = 37.4\%$ giving 63% two-level cascades, 23%
three-level cascades, 9% four-level cascades, 3% five-level
cascades, 1% six-level cascades, and so on. As previously,
these ratios overestimate the proportion of two-level
cascades and underestimate the proportion of higher level
cascades in E. coli with 37.7% two-level cascades, 52.5%
three-level cascades, and 9.8% four-level cascades [28].

m \ N   3,000         6,000         9,000         12,000        15,000
1       0.061 (85)    0.011 (346)   0.019 (531)   0.016 (706)   0.013 (888)
2       0.277 (69)    0.281 (125)   0.288 (175)   0.283 (239)   0.290 (303)
3       0.447 (21)    0.436 (32)    0.448 (41)    0.453 (52)    0.450 (67)
4       0.554 (4)     0.542 (4)     0.549 (7)     0.550 (9)     0.548 (11)
5       0.622 (1)     0.610 (2)     0.618 (2)     0.620 (1)     0.619 (2)

TABLE I: The relative size of the largest island component and the total number of islands (in brackets) for networks of
various sizes N and for different choices for the initial number of links per node or regulator m.

FIG. 9: The total number of discrete disconnected islands $i_{all}$,
the number of islands with respectively three ($i_3$), four ($i_4$) and
five ($i_5$) members (left hand axis), and the simulated ($\langle s_1 \rangle$)
and predicted ($s_1$) size of the largest island measured as a
proportion of nodes for various genome sizes (right hand axis).
We note that the choice $m = 2$ ensures there are no two
member islands.

The size of the largest connected island is again expected
to occupy a constant proportion of the genome
regardless of size. An equivalent derivation to that of
Eq. 32 gives the rate of growth in the size of island $s_1$ as

$$\frac{ds_1}{dk} = 2 (rm)^2 \, \frac{s_1 [k - s_1]}{k^2} \, \langle s_{j \neq 1} \rangle. \qquad (42)$$



For initial conditions, we assume that a first link appears
when the genome has $1/r \approx 11$ nodes giving $s_1(11) = 3$
as the choice $m = 2$ ensures there are no two member
islands. As previously, a consistent solution exists with
island size growing linearly with genome size ($s_1 = ak$)
with $a = 1 - 1/[2(rm)^2 \langle s_{j \neq 1} \rangle]$ under the assumption that
sufficient small islands are created to ensure $\langle s_{j \neq 1} \rangle$
remains a constant. Simulations show the average size of
outside islands to be very closely $\langle s_{j \neq 1} \rangle = 4.17$ over a
large range of genome sizes, while a reasonable match
between theory and simulation requires setting $\langle s_{j \neq 1} \rangle = 20$,
which is reasonable given the approximations made. Fig. 9
shows the number of all discrete islands as well as the
number of islands containing three, four, and five components,
and the predicted and simulated sizes of the largest
island expressed as a proportion of the total genome size
with a close match between theory and simulation. This
figure suggests that the E. coli genome of N = 2528 operons
should possess a giant component containing about
30% of all nodes or about 460 operons which overesti- 
mates that observed [2(j. Again, the constant propor- 
tion of the genome taken up by the largest island, and 
the constantly growing number of discrete islands means 
that this network architecture suffers no maximum size 
limit. As a result, this approach is unable to explain the 
upper size limit observed in the evolutionary record. 
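
The approach to a constant island fraction can be
illustrated by integrating the growth equation numerically.
The sketch below assumes the rate
$ds_1/dk = 2(rm)^2 \, s_1(k - s_1)/k^2 \, \langle s_{j \neq 1} \rangle$
with $r = 0.0935$, $m = 2$, $\langle s_{j \neq 1} \rangle = 20$
and the initial condition $s_1(11) = 3$, and uses a plain
forward Euler step. The function name `island_fraction` and
the step size are illustrative choices.

```python
def island_fraction(n_final, r=0.0935, m=2, s_other=20.0,
                    k0=11.0, s0=3.0, dk=0.05):
    """Integrate ds1/dk = c * s1 * (k - s1) / k**2 from k0 up to n_final.

    Returns the simulated fraction s1/k at n_final together with the
    linear-solution slope a = 1 - 1/c, where c = 2 (r m)^2 <s_other>.
    """
    c = 2.0 * (r * m) ** 2 * s_other
    steps = int(round((n_final - k0) / dk))
    s1 = s0
    for i in range(steps):
        k = k0 + i * dk
        s1 += dk * c * s1 * (k - s1) / k**2   # forward Euler update
    return s1 / n_final, 1.0 - 1.0 / c


fraction, a = island_fraction(2528)
print(f"integrated fraction at N = 2528: {fraction:.3f}")
print(f"linear-solution slope a:         {a:.3f}")
```

Under these assumptions the integrated fraction relaxes
towards the slope $a$ of the linear solution, which is the
size-independent behaviour described in the text.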

The two-parameter model has been developed with the 
setting m = 2 to best match the observed number of links 
per regulator. However, a setting m = 3 provides at least 
as good a match, and it is possible that choosing alternate 
settings for the initial number of links per regulator (m) 
and per node (rm) might improve the fit to the data. 
Table I shows the relative size of the largest island and the
total number of islands for simulated genomes of different
size and for different choices of $m$. It is clear that choices
$m \geq 3$ overestimate the size of the largest regulatory
islands while the choice $m = 1$ gives a poor fit to the
observed number of regulatory links per regulator.
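
The trend in Table I can be summarised in a few lines. The
sketch below simply transcribes the table values (largest
island fraction and island count for each $m$ and $N$) and
reports, for each $m$, the mean fraction and its spread over
genome sizes. The names `TABLE_I` and `summarize` are
illustrative.

```python
SIZES = (3000, 6000, 9000, 12000, 15000)

# (largest island fraction, number of islands) transcribed from Table I.
TABLE_I = {
    1: [(0.061, 85), (0.011, 346), (0.019, 531), (0.016, 706), (0.013, 888)],
    2: [(0.277, 69), (0.281, 125), (0.288, 175), (0.283, 239), (0.290, 303)],
    3: [(0.447, 21), (0.436, 32), (0.448, 41), (0.453, 52), (0.450, 67)],
    4: [(0.554, 4), (0.542, 4), (0.549, 7), (0.550, 9), (0.548, 11)],
    5: [(0.622, 1), (0.610, 2), (0.618, 2), (0.620, 1), (0.619, 2)],
}


def summarize(table):
    """Mean and spread (max - min) of the largest island fraction per m."""
    summary = {}
    for m, row in table.items():
        fractions = [fraction for fraction, _ in row]
        summary[m] = {
            "mean": sum(fractions) / len(fractions),
            "spread": max(fractions) - min(fractions),
        }
    return summary


summary = summarize(TABLE_I)
for m, stats in summary.items():
    print(f"m = {m}: mean fraction {stats['mean']:.3f}, "
          f"spread over N {stats['spread']:.3f}")
```

For $m \geq 2$ the spread over genome sizes is small
compared with the mean, in keeping with a largest island
that occupies a roughly constant proportion of the genome.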



IV. CONCLUSION 

In this paper, we developed two probabilistic nonac- 
celerating network models for the growth of prokaryote 
regulatory gene networks. These models complement the 
accelerating network model presented in Ref. [20], allowing
a comparison of these alternate approaches.

Each of the nonaccelerating models presented here 
faces considerable difficulties in providing a plausible 
physical mechanism justifying a nonaccelerating regula- 
tory model, and fails to consider any additional steric or 
logical limitations on combinatoric control at any given 
promotor. Further, these approaches are unable to explain
the observed quadratic growth in prokaryote regulator
number with increasing genome size displayed in
Fig. ^ This mismatch between predicted and observed 
numbers of regulators is also reflected in the overly short 
expected promotor sequence lengths in each model. Further,
the linear growth in regulator number with genome
size effectively means that these networks are becoming
relatively more and more sparsely connected with
growth: the desired maximum number of possible links
grows as $N^2$ so the relative density of links goes as
$L/N^2 \propto 1/N \to 0$ as $N$ becomes large. This decrease
in relative connection density means that nonaccelerating
networks suffer their own inherent size constraints
as complex networks operate poorly when sparsely
connected.

We further compared each model to observed results 
for E. coli, and achieved reasonable matches for the aver- 
age connectivity of the long tailed distribution of outgo- 
ing regulatory links (approximately 5) and the average of 
the exponential distribution of incoming regulatory links 
(approximately 1.5). However, the distributions them- 
selves were either overly lightly connected (model one) 
or decayed overly rapidly leading to a distinct under- 
representation of highly connected nodes compared to 
the E. coli distributions (models one and two). Each of 
the nonaccelerating models was able to reasonably match 
the observed proportion of regulators controlling regula- 
tors (approximately 31.4%) and in turn, the probable 
length of regulatory cascades. Lastly, the first of the 
nonaccelerating models was able to roughly reproduce 
E. coli statistics on the numbers of discrete regulatory 
islands, though the second model overestimated the size 
of the largest discrete regulatory island. Because of the 
size independent statistics of these nonaccelerating models,
neither approach displays structural transitions at
any critical network size, and thus each faces difficulties in
explaining the prokaryote size and complexity limitations
evident in the evolutionary record.

Our approach in this paper (and in Ref. [20]) is unable
to explain the high proportion of autoregulation observed 
in E. coli j2|| and this failure likely points to selection for 
genome reorganizations leading to spatial arrangements 
of operons allowing joint regulation [34] which is not
included in this model. Further, this approach does not
include selection pressures ensuring that similarly
regulated islands or modules share common functionality
[26], or other regulatory mechanisms influencing both the
transcription and translation of transcription factors
including microRNAs and other chemical mechanisms and
mediators (see for instance [85^1. 

The accelerating and nonaccelerating models of 
prokaryote gene networks differ most markedly in their 
predictions for the age dependency of the distribution of 
inbound and outbound regulatory links. It would be in- 
teresting to obtain information on the correlation (if any) 
between age and link number for different prokaryotes to 
properly distinguish these approaches. 

We conclude that viable models of prokaryote regula- 
tory gene networks are likely to be accelerating in nature. 
This is important as much current network analysis is 
predicated on the assumption that only nonaccelerating 
networks are relevant to society or biology due to their 
unconstrained sizes and constant statistics. However, 
such assumptions make it very difficult to explain the 
size limitations displayed by prokaryotic gene networks 
in the evolutionary record. Subsequently, it is likely that 
viable models of eukaryotic regulatory networks will be 
accelerating and will incorporate computationally com- 
plex technologies j^E E]], El3 ■ 